A Source Coding Approach to Classification by Vector Quantization and the Principle of Minimum Description Length

نویسنده

  • Jia Li
چکیده

An algorithm for supervised classi cation using vector quantization and entropy coding is presented. The classi cation rule is formed from a set of training data f(Xi; Yi)g n i=1, which are independent samples from a joint distribution PXY . Based on the principle of Minimum Description Length (MDL), a statistical model that approximates the distribution PXY ought to enable e cient coding of X and Y . On the other hand, we expect a system that encodes (X;Y ) e ciently to provide ample information on the distribution PXY . This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y , a two-stage vector quantizer is applied to X and a Hu man code is formed for Y conditioned on each quantized value of X. The optimization of the encoder is equivalent to the design of a vector quantizer with an objective function re ecting the joint penalty of quantization error and misclassi cation rate. This vector quantizer provides an estimation of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classi cation rule. This algorithm, namely Discriminant Vector Quantization (DVQ), is compared with Learning Vector Quantization (LVQ) and CART on a number of data sets. DVQ outperforms the other two on several data sets. The relation between DVQ, density estimation, and regression is also discussed.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Simultaneous Classification and Feature Clustering Using Discriminant Vector Quantization with Applications to Microarray Data Analysis

In many applications of supervised learning, automatic feature clustering is often desirable for a better understanding of the interaction among the various features as well as the interplay between the features and the class labels. In addition, for high dimensional data sets, feature clustering has the potential for improvement in classification accuracy and reduction in computational complex...

متن کامل

A Method to Reduce Effects of Packet Loss in Video Streaming Using Multiple Description Coding

Multiple description (MD) coding has evolved as a promising technique for promoting error resiliency of multimedia system in real-time application programs over error-prone communicational channels. Although multiple description lattice vector quantization (MDCLVQ) is an efficient method for transmitting reliable data in the context of potential error channels, this method doesn’t consider disc...

متن کامل

Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition

 In this paper, utilization of clustering algorithms for data fusion in decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...

متن کامل

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...

متن کامل

Spectral-spatial classification of hyperspectral images by combining hierarchical and marker-based Minimum Spanning Forest algorithms

Many researches have demonstrated that the spatial information can play an important role in the classification of hyperspectral imagery. This study proposes a modified spectral–spatial classification approach for improving the spectral–spatial classification of hyperspectral images. In the proposed method ten spatial/texture features, using mean, standard deviation, contrast, homogeneity, corr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002